Reconstruct high-resolution 3D genome structures for diverse cell-types using FLAMINGO

High-resolution reconstruction of spatial chromosome organization from chromatin contact maps is in high demand, but is hindered by extensive pairwise constraints, substantial missing data, and limited resolution and cell-type availability. Here, we present FLAMINGO, a computational method that addresses these challenges by compressing inter-dependent Hi-C interactions to delineate the underlying low-rank structures in 3D space, based on the low-rank matrix completion technique. FLAMINGO successfully generates 5 kb- and 1 kb-resolution spatial conformations for all chromosomes in the human genome across multiple cell-types, the largest resource to date. Compared to other methods using various experimental metrics, FLAMINGO consistently demonstrates superior accuracy in recapitulating observed structures, with scalability improved by orders of magnitude. The reconstructed 3D structures facilitate the discovery of higher-order multi-way interactions, suggest biological interpretations of long-range QTLs, reveal geometrical properties of chromatin, and provide high-resolution references for understanding structural variability. Importantly, FLAMINGO makes robust predictions despite high rates of missing data and significantly boosts the resolution of 3D structures. Moreover, FLAMINGO provides robust cross cell-type structure predictions that capture cell-type-specific spatial configurations via integration of 1D epigenomic signals. FLAMINGO can be widely applied to large-scale chromatin contact maps to expand high-resolution spatial genome conformations for diverse cell-types.

Overview of the assembly algorithm of FLAMINGO. After FLAMINGO reconstructs the intra-domain structures along each chromosome, adjacent domains (e.g. domains 1 and 2) are iteratively rotated to match the observed inter-domain pairwise distances between 5 kb DNA fragments from different domains (i.e. the off-diagonal points in the matrix). FLAMINGO first searches for the optimal 3D Givens rotations of the x-axis through adjacent domains, and then searches for the optimal Givens rotations of the y- and z-axes. Through iterations, FLAMINGO identifies the approximately optimal 3D rotation for each domain, minimizing the differences between the reconstructed and observed inter-domain distances. Upon convergence, the optimally rotated domains are assembled to generate the final chromosome structures.

(a) Long-range ChIA-PET interactions (red links) link a distal enhancer (orange) to the KCNA2 promoter (red). Both anchors are bound by CTCF and Rad21. In 3D space, FLAMINGO predicts a short 3D distance (0.083) between the two anchors, despite the long 1D genomic distance (130 kb). In comparison, Hierarchical3DGenome predicts a 3D distance of 0.128 between the two anchors. (b) Long-range ChIA-PET interactions (red links) link a distal enhancer (orange) to the PAFAH2 promoter (red). Both anchors are bound by CTCF and Rad21. In 3D space, FLAMINGO predicts a short 3D distance (0.103) between the two anchors, despite the long 1D genomic distance (103 kb). In comparison, Hierarchical3DGenome predicts a 3D distance of 0.132 between the two anchors. (c) Long-range ChIA-PET interactions link a distal enhancer to the TESK2 promoter. Both anchors are bound by CTCF and Rad21. FLAMINGO predicts a short 3D distance (0.092) between the two anchors, although the 1D genomic distance is 131 kb. In comparison, Hierarchical3DGenome predicts a 3D distance of 0.143.
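The iterative rotation search described in the assembly-algorithm overview can be illustrated with a minimal sketch. It is not FLAMINGO's implementation: the grid search over angles, the choice of the first fragment of the rotated domain as the pivot, and all function names (`givens_rotation`, `inter_domain_error`, `rotate_domain`) are assumptions made here for illustration.

```python
import numpy as np

def givens_rotation(axis, theta):
    """3x3 Givens rotation in the plane orthogonal to `axis` (0=x, 1=y, 2=z)."""
    c, s = np.cos(theta), np.sin(theta)
    i, j = [(1, 2), (0, 2), (0, 1)][axis]
    R = np.eye(3)
    R[i, i], R[j, j] = c, c
    R[i, j], R[j, i] = -s, s
    return R

def inter_domain_error(A, B, observed):
    """Squared error between predicted and observed inter-domain distances.

    `observed` is an (len(A) x len(B)) matrix; NaN marks missing entries.
    """
    pred = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    mask = ~np.isnan(observed)
    return np.sum((pred[mask] - observed[mask]) ** 2)

def rotate_domain(A, B, observed, n_angles=72, n_iter=5):
    """Rotate domain B (domain A stays fixed) to fit the observed distances.

    Assumption: angles are searched on a regular grid, one axis at a time
    (x, then y, then z), and the pivot is the first fragment of B.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    pivot = B[0].copy()
    for _ in range(n_iter):
        for axis in range(3):
            candidates = [
                (B - pivot) @ givens_rotation(axis, t).T + pivot for t in angles
            ]
            errors = [inter_domain_error(A, c, observed) for c in candidates]
            # Angle 0 is on the grid, so the error never increases.
            B = candidates[int(np.argmin(errors))]
    return B
```

Because the zero angle is always among the candidates, each axis update can only keep or reduce the inter-domain error, which mirrors the iterative refinement described in the caption.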

Figure 18. FLAMINGO robustly reconstructs the high-resolution 3D structures using a small fraction of observed Hi-C data. (a) FLAMINGO identifies the long-range chromatin interactions (highlighted by black arrows) based on a down-sampled input matrix from Hi-C data (down-sampling rate = 50%) and recovers the high-resolution complete distance matrix (top), while Hierarchical3DGenome fails to discover these high-resolution structural features. The identified long-range chromatin interactions are supported by the binding of CTCF and cohesin.

Algorithm to align 3D structures
The 3D structures predicted by different algorithms need to be aligned with each other. To enable direct visual comparisons between predicted structures, we rotate and align the 3D structures using the following method. Given a 0-centered reference structure ($P$) and a 0-centered query structure ($Q$) with $n$ points, each stored as an $n \times 3$ coordinate matrix, the optimal rotation matrix ($R$) from the query structure to the reference structure is calculated as $R = UV^T$, where $U$ and $V$ are calculated from the singular value decomposition of the matrix $Q^T P$. The rotated query structure is calculated as $Q' = QR$, which optimally aligns with the reference structure 2 .
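The SVD-based alignment can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function name `align_structure` and the symbols `P` (reference) and `Q` (query) are chosen here, and the determinant check that prevents a reflection is an added safeguard that the text does not mention.

```python
import numpy as np

def align_structure(reference, query):
    """Rotate `query` (n x 3) onto `reference` (n x 3) using an SVD."""
    # Center both structures at the origin, as the method assumes.
    P = reference - reference.mean(axis=0)
    Q = query - query.mean(axis=0)
    # SVD of the 3 x 3 cross-covariance matrix Q^T P = U S V^T.
    U, _, Vt = np.linalg.svd(Q.T @ P)
    # Assumption (not in the text): flip the last singular direction if
    # needed so that R is a proper rotation with det(R) = +1.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    # Rows are points, so the rotated query is Q R.
    return Q @ R, R
```

For example, `aligned, R = align_structure(ref_xyz, query_xyz)` returns the centered, rotated query together with the rotation matrix, so both structures can be drawn in the same frame.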

Performance validation using FISH data
The FISH data experimentally measure the 3D coordinates of TADs, whose typical size is hundreds of kilobases. Therefore, the FISH data provide additional evidence to support the predictive accuracy of FLAMINGO. The 5 kb DNA fragment located at the center of each TAD is used to represent the predicted 3D coordinates of that TAD. The Spearman correlations between the observed structures from the FISH data and the structures predicted by FLAMINGO are calculated to quantify the model performance.
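One way to compute such a correlation is sketched below. The text does not specify which quantities are correlated; this sketch assumes the comparison is made on pairwise distances between TADs, and the names `fish_agreement` and `tad_centers` are hypothetical. The rank computation is a simple version that assumes no tied distances.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance matrix for an (n x 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def spearman(x, y):
    """Spearman correlation via ranks (assumes no ties in x or y)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

def fish_agreement(fish_coords, predicted_coords, tad_centers):
    """Correlate FISH and predicted inter-TAD distances.

    fish_coords      : (t x 3) FISH coordinates, one row per TAD
    predicted_coords : (m x 3) predicted coordinates of all 5 kb fragments
    tad_centers      : length-t indices of the fragment at each TAD center
    """
    pred_tads = predicted_coords[tad_centers]
    upper = np.triu_indices(len(tad_centers), k=1)  # each TAD pair once
    d_fish = pairwise_distances(fish_coords)[upper]
    d_pred = pairwise_distances(pred_tads)[upper]
    return spearman(d_fish, d_pred)
```

Because Spearman correlation depends only on ranks, the result is unaffected by the arbitrary scale of the predicted coordinates.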

Notes on model comparison
We compare FLAMINGO with seven state-of-the-art algorithms: PASTIS 9 , RPR 10 , GEM-
ShRec3D is an MDS-based approach that reconstructs 3D genome structures from shortest-path distances, which are generated by applying a shortest-path algorithm (i.e. the Floyd-Warshall algorithm) to the observed distance matrix from Hi-C. The ShRec3D software is downloaded from GitHub (https://GitHub.com/kpj/ShRec3D). The observed distance matrix is created from Hi-C data using a conversion factor of 0.25, the same value used for FLAMINGO. Missing data in the observed distance matrix are filled with 0 as instructed. ShRec3D can complete predictions for chromosomes 13-22 within three days.
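The shortest-path-plus-MDS idea behind this class of methods can be sketched as follows. This is a generic illustration, not the ShRec3D code: the function names are hypothetical, off-diagonal zeros are assumed to mark missing entries and are treated as absent edges, and the contact graph is assumed to be connected.

```python
import numpy as np

def shortest_path_distances(dist):
    """Floyd-Warshall completion of a partially observed distance matrix."""
    d = dist.astype(float).copy()
    off_diag = ~np.eye(len(d), dtype=bool)
    # Assumption: a zero off the diagonal means "not observed".
    d[(d == 0) & off_diag] = np.inf
    for k in range(len(d)):
        # Allow paths that pass through fragment k.
        d = np.minimum(d, d[:, k, None] + d[None, k, :])
    return d

def classical_mds(d, dim=3):
    """Embed a complete distance matrix into `dim` dimensions."""
    n = len(d)
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (d ** 2) @ J                # Gram matrix of centered points
    w, v = np.linalg.eigh(B)                   # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]            # keep the largest `dim`
    return v[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```

Calling `classical_mds(shortest_path_distances(observed))` returns one 3D coordinate per fragment. The cubic cost of the Floyd-Warshall step is one reason such methods become slow at high resolution.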
ShNeigh is an improved MDS-based method that models the 3D proximity of neighboring points. The stand-alone package of ShNeigh is downloaded from GitHub (https://github.com/fangzhen-li/ShNeigh) and tested following the instructions. ShNeigh can finish the predictions for chromosomes 15-22 and X.
Hierarchical3DGenome combines optimization of the Lorentzian function with a hierarchical prediction strategy to predict 3D structures. The software is downloaded from GitHub (https://GitHub.com/BDM-Lab/Hierarchical3DGenome) and tested as instructed. Hierarchical3DGenome can complete predictions for all 23 chromosomes.
To directly compare the predicted 3D distances between DNA fragments from different models, we unify the predicted 3D coordinates as $X_{\mathrm{unify}} = X / \|X\|_F$, where $X$ is the matrix of predicted 3D coordinates and $\|X\|_F$ is its Frobenius norm. The 3D distances based on the unified coordinates can be directly compared with each other and are used for downstream analyses.
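The normalization is a one-line operation; a minimal sketch is given below, with the function names `unify_coordinates` and `unified_distances` chosen here for illustration.

```python
import numpy as np

def unify_coordinates(coords):
    """Scale an (n x 3) coordinate matrix to unit Frobenius norm."""
    # np.linalg.norm on a 2D array defaults to the Frobenius norm.
    return coords / np.linalg.norm(coords)

def unified_distances(coords):
    """Pairwise 3D distances computed from the unified coordinates."""
    X = unify_coordinates(coords)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```

After this rescaling, two models that predict the same shape at different overall scales yield identical distance matrices, which is what makes the distances comparable across methods.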

CTCF motif identification
The CTCF motif hits along the human genome are collected from the motif browser 16 using a confidence score threshold of 0.3. The control motifs are filtered out. The orientations of the CTCF motifs are determined by aligning the DNA sequences of the motif hits to the CTCF motif using TOMTOM 17 .